06_3_A_ESmoothing_Trended_Seasonal

library(fpp3)

References

  1. Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). https://otexts.com/fpp3/

  2. Fable package documentation. Retrieved from:

  3. Hyndman, R. J., Koehler, A. B., Ord, J. K., & Snyder, R. D. (2008). Forecasting with exponential smoothing: The state space approach. Springer-Verlag.

Note

This notebook is not original material, but rather based on ref. 1 with notes expanding on some of the concepts contained in the book.

Motivation

Extension of Holt’s Method to capture seasonality. It comprises the forecast equation and three smoothing equations for the following paramentrs

  • The level \(l_t\) (parameter \(\alpha\))
  • The trend \(b_t\) (parameter \(\beta*\))
  • The seasonal component \(s_t\) (parameter \(\gamma\))

The symbol \(m\) is the length of the seasonality period, i.e., the duration of seasons a seasonal period. For quarterly data \(m = 4\), for monthly data \(m= 12\), etc… etc…

Additive vs Multiplicative

There are two variations of the method.

Additive

  • The additive method is preferred when the seasonal variations do not change proportionally to the level of the series.
    • Seasonal component expressed in absolute terms in the scale of the observed series
    • Level equation: the series is seasonally adjusted by subtracting the seasonal component
    • Within a year the seasonal component will add up to approximately zero. See the session classical decomposition from scratch to understand why this is:

\[ \begin{align*} \text{For additive seasonal ETS:} && \sum_{i=1}^{m} s_i \approx 0 \end{align*} \] As a result, the seasonal component does not contribute to the average of the time series over one period.

Multiplicative

  • The multiplicative method is preferred when the seasonal variations change proportionally to the level of the series.
    • Seasonal component expressed in relative terms (percentages)
    • Level equation: the series is seasonally adjusted by dividing through by the seasonal component.
    • Within a year the seasonal component will sum up to approximately \(m\).. See the session classical decomposition from scratch to understand the reason.

\[ \begin{align*} \text{For multiplicative seasonal ETS:} && \sum_{i=1}^{m} s_i \approx m \end{align*} \]

Holt’s Winters’ additive method

Component form of the equations

Unlike for Simple Exponential Smoothing, we will not derive the equations. We will accept them and then interpret them further down below. For those interested, see reference 3.

Differences with trended exponential smoothing:

  • Introduce new parameter, the seasonal index \(s_t\).
    • This implies introducing a new equation for the seasonal index.
  • The seasonal index has its own smoothing parameter \(\gamma\).

Component form equations:

\[ \begin{align*} \text{Forecast equation}&& \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t+h-m(k+1)} \\ \text{Level equation}&& \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation}&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ \text{Seasonal equation}&& s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m}, \end{align*} \]

\(k\) subindex in the seasonal component

The subindex \(k = int(\frac{h-1}{m})\) might be confusing.

  • \(int()\) is the function integer part. This function returns the integer part of any given real number. for example:
    • int(0.1) = 0
    • int(1.182341) = 1
    • int(2.154) = 2
  • \(h\) is the forecast horizon.
  • \(m\) is the length of the seasonal period. Some examples:
    • For quarterly data and a seasonal period of 1 year: \(m=4\)
    • For monthly data and a seasonal period of 1 year: \(m=12\)
    • For daily data and a seasonal period of 1 week: \(m=7\)
    • For weekly data and a seasonal period of 1 year: \(m=52\)

The figure below helps understand what the expression \(k = int(\frac{h-1}{m})\) attains. \(k\) is the number of seasonal periods between the current forecast horizon and the latest observed season. To simplify notation for the reader, the figure has been drawn considering a specific example with \(m=12\) (monthly data and a yearly seasonal period):

Equations for the fitted values

To understand how these equations help fit a model to a time series with both trend and seasonality, let us particularize the forecast equation for \(h=1\) (fitted values are one-step ahead forecasts on the training data). We will subsequently particularize the equation at time \(t\) instead of the resulting \(t+1\), to make it consistent with the level and trend equations.

  • NOTE for \(h=1\) we have \(k=int(\frac{h-1}{m}) = 0\). This is also considered in the equation below

\[ \begin{align*} \text{Forecast equation for h = 1} && \hat{y}_{t+1|t} &= \ell_{t} + 1\cdot b_{t} + s_{t+1-m} \\ \text{Particularize for t instead t+1} && \hat{y}_{t|t-1} &= \ell_{t-1} + 1\cdot b_{t-1} + s_{t-m} && (\text{fitted value at t})\\ \end{align*} \]

Therefore, the equations for the fitted values are:

\[ \begin{align*} \text{Fitted value at t} && \hat{y}_{t|t-1} &= \ell_{t-1} + b_{t-1} + s_{t-m} \\ \text{Level equation}&& \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ \text{Trend equation}&& b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ \text{Seasonal equation}&& s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m}, \end{align*} \]

These equations allow us to determine the fitted values if given:

  1. The values of the smoothing parameters \(\alpha\), \(\beta^*\) and \(\gamma\)
  2. An initial value for the level \(l_{0}\)
  3. An initial value for the trend \(b_{0}\)
  4. Initial values for the seasonal indices of an entire initial season: \((s_0, s_{-1}, s_{-2}, \dots, s_{1-m})\)
    • Note that the previous vector consists of \(m\) seasonal indices, given as initial values.
    • The excel file in which you will do an example for quarterly data should help clarify this.

The process to compute the fitted values having this data is as follows:

  • For \(t = 1\):
    1. \(\hat{y}_{1|0} = \ell_{0} + b_{0} + s_{1+1-m}\)
    2. \(l_{1} = \alpha(y_{1}-s_{1-m}) + (1 - \alpha)(l_{0} + b_{0})\)
      • The value of \(l_1\) will be used in the trend equation next.
    3. \(b_{1} = \beta^*(l_{1} - l_{0}) + (1 -\beta^*)b_{0}\)
    4. \(s_1 = \gamma(y_t-l_{0}-b_{0})+(1-\gamma)s_{1-m}\)
  • For \(t = 2\): we proceed in the same manner after having computed \(l_1\) and \(b_1\) and \(s_1\) in the previous step.
  • For \(t = 3\): we proceed in the same manner after having computed \(l_2\) and \(b_2\) and \(s_2\) in the previous step.

Interpreting the equations

Let us briefly interpret the equations:

Level equation

Weighted average between the seasonally adjusted observation and the non-seasonal forecast

\[ \begin{align*} l_{t} &= \alpha\underbrace{(y_{t} - s_{t-m})}_{\substack{\textrm{seasonally adjusted} \\ \textrm{observation}}} + (1 - \alpha)\underbrace{(\ell_{t-1} + b_{t-1})}_\textrm{non seasonal forecast}\\ \end{align*} \]

Trend equation

identical to Holt’s linear additive method. See the previous session.

\[ \begin{align*} \text{Trend equation} && b_{t} &= \beta^*\underbrace{(\ell_{t} - \ell_{t-1})}_{\substack{\textrm{estimated trend} \\ \textrm{at time t}}} + (1 -\beta^*)\underbrace{b_{t-1}}_{\substack{\textrm{previous} \\ \textrm{estimate} \\ \textrm{of the trend}}}, \end{align*} \]

  • NOTE: \(\ell_{t} - \ell_{t-1}\) is an estimate of the trend because, if we take the level \(l_t\) as an estimate of the time series, then the difference between consecutive levels divided by 1 (the distance in timesteps between \(l_t\) and \(l_{t-1}\)) is a measure of the level at time t. The figure below should clarify this:

The seasonal equation

weighted average between an estimate of the current seasonal index and the seasonal index of the same observation for in the previous season (\(m\) periods ago).

\[ \begin{align*} s_{t} &= \gamma \underbrace{(y_{t}-\ell_{t-1}-b_{t-1})}_{\substack{\textrm{estimate of} \\ \textrm{current seasonal index}}} + (1-\gamma)\underbrace{s_{t-m}}_{\substack{\textrm{seasonal index} \\ \textrm{of the} \\ \textrm{same season} \\ \textrm{last year}}} \end{align*} \]

  • NOTE: \(y_{t}-\ell_{t-1}-b_{t-1}\) is an estimate the current seasonal index because we remove from the observation at time t the effect of the level and the slope at the previous step. This results in an estimate of the current seasonal index.

Fitted value equation

The fitted value at \(t\) is obtained just as in Holt’s method, only now the seasonal index of the matching observation in the previous session (\(s_{t-m}\)) is added.

\[ \begin{align*} \text{Fitted value at t} && \hat{y}_{t|t-1} &= \ell_{t-1} + 1 \cdot b_{t-1} + s_{t-m} \\ \end{align*} \]

Equation for the forecasts

Once the fitted values \(\hat{y_t}\), the levels \(l_t\), the slopes \(b_t\) and the seasonal indices \(s_t\) have been computed for every value of in the training region (from to \(t=1\) to \(t=T\)) we can compute the forecasts.

That is, once we have fitted the model the forecasts beyond the training region are given by:

\[ \begin{align*} \text{Forecast equation} && \hat{y}_{T+h|T} &= \ell_{T} + hb_{T} + s_{T+h-m(k+1)} \end{align*} \]

  • where \(t=T\) is the last point in time of the training data.
  • \(k = int(\frac{h-1}{m})\). See the previous sections for the interpretation of \(k\).

Note that only the last values \(l_T\), \(b_T\) and the last season of the seasonal indices \(s_{T+h-m(k+1)\) intervene in the forecasts.

You can see that the h-step ahead forecast is equal to the last estimated level plus h times the last estimated trend value plus the value of the seasonal index corresponding to the last estimated season

Holt Winter’s multiplicative Method

In the multiplicative variant of Holt-Winter’s method:

  1. The level \(l_t\) and the trend \(b_t\) relate in an additive manner.
  2. The seasonal index \(s_{t+h-m(k+1)}\) relates in a multiplicative manner with them both.

The resulting equations are equivalent to those of the additive method, but now the seasonal indices multiply if they were adding before and divide if the were being subtracted. This results in:

\[ \begin{align*} \hat{y}_{t+h|t} &= (\ell_{t} + hb_{t})s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha \frac{y_{t}}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t}-\ell_{t-1}) + (1 - \beta^*)b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + b_{t-1})} + (1 - \gamma)s_{t-m}. \end{align*} \]

The equation for the fitted values is obtained just as for the additive method:

  1. Particularize the forecast equation for \(h=1\).
  2. Particularize the result at \(t\) instead of \(t+1\)

This will not be done in this notebook. Feel free to do it as an exercise and try to reproduce the table for the fitted values corresponding to the multiplicative case in the next section to ensure you have done it correctly.

Fitting the models

Fitting the Holt-Winter’s method to a time series implies the following process to minimize the Sum of Squared Residuals (SSE). In this case, we are not going to implement this from scratch as we did for simple exponential smoothing. An understanding of the process is sufficient.

Importantly, we now require initial values for an entire season for the seasonal indices. This increases a lot the number of parameters involved.

Example for the additive case

In the table below, an example for the fitted values for Holt-Winters’ method with additive seasonality is computed. For that, the following initial values are required:

  • The level \(l_t\) of the series - estimated as the average of the observations in the first period (first four observations)
  • The slope of the series \(b_t\) - since the series send to have no trend within the first season, we estimate it as 0.
  • Since in this case \(m=4\), we require initial estimates of the first 4 seasonal components.
    • NOTE: a possible way to produce this estimates is as the first four elements of the seasonal component of a classical decomposition (additive) of the series.
    • NOTE that for monthly data we would require 12 estimates. With longer seasonal periods this becomes increasingly complex.
Quarter Time Observation Level Slope Season Forecast
t \(y_t\) \(l_t\) \(b_t\) \(s_t\) \(\hat{y}_{t+1|t}\)
1997 Q1 0 1.5
1997 Q2 1 -0.3
1997 Q3 2 -0.7
1997 Q4 3 9.8 0.0 -0.5
1998 Q1 4 11.8 9.9 0.0 1.5 11.3
1998 Q2 5 9.3 9.9 0.0 -0.3 9.7
1998 Q3 6 8.6 9.7 -0.0 -0.7 9.2
1998 Q4 7 9.3 9.8 0.0 -0.5 9.2
2017 Q1 80 12.4 10.9 0.1 1.5 12.3
2017 Q2 81 10.5 10.9 0.1 -0.3 10.7
2017 Q3 82 10.5 11.0 0.1 -0.7 10.3
2017 Q4 83 11.2 11.3 0.1 -0.5 10.6
\(h\) \(\hat{y}_{T+h}\)
2018 Q1 1 12.9
2018 Q2 2 11.2
2018 Q3 3 11.0
2018 Q4 4 11.2
2019 Q1 5 13.4
2019 Q2 6 11.7
2019 Q3 7 11.5
2019 Q4 8 11.7
2020 Q1 9 13.9
2020 Q2 10 12.2
2020 Q3 11 11.9
2020 Q4 12 12.2

Example for multiplicative case

In the table below, an example for the fitted values for Holt-Winters’ method with multiplicative seasonality is computed. For that, the following initial values are required:

The following table does the same but following a multiplicative approach:

  • The level \(l_t\) of the series - estimated as the average of the observations in the first period (first four observations)
  • The slope of the series \(b_t\) - since the series send to have no trend within the first season, we estimate it as 0.
  • Since in this case \(m=4\), we require initial estimates of the first 4 seasonal components.
    • NOTE: a possible way to produce this estimates is as the first four elements of the seasonal component of a classical decomposition (multiplicative) of the series.
    • NOTE that for monthly data we would require 12 estimates. With longer seasonal periods this becomes increasingly complex.
Quarter Time Observation Level Slope Season Forecast
\(t\) \(y_t\) \(l_t%\) \(b_t\) \(s_t\) \(y_{t+1}\)
1997 Q1 0 1.2
1997 Q2 1 1.0
1997 Q3 2 0.9
1997 Q4 3 10.0 -0.0 0.9
1998 Q1 4 11.8 10.0 -0.0 1.2 11.6
1998 Q2 5 9.3 9.9 -0.0 1.0 9.7
1998 Q3 6 8.6 9.8 -0.0 0.9 9.2
1998 Q4 7 9.3 9.8 -0.0 0.9 9.2
2017 Q1 80 12.4 10.8 0.1 1.2 12.6
2017 Q2 81 10.5 10.9 0.1 1.0 10.6
2017 Q3 82 10.5 11.1 0.1 0.9 10.2
2017 Q4 83 11.2 11.3 0.1 0.9 10.5
\(h\) \(\hat{y}_{T+h\vert T}\)
2018 Q1 1 13.3
2018 Q2 2 11.2
2018 Q3 3 10.8
2018 Q4 4 11.1
2019 Q1 5 13.8
2019 Q2 6 11.7
2019 Q3 7 11.3
2019 Q4 8 11.6
2020 Q1 9 14.4
2020 Q2 10 12.2
2020 Q3 11 11.7
2020 Q4 12 12.1

Exercise: reproduce these tables (excels provided by professor)

R Example: Domestic overnight trips in Australia

Model specification to fit a Holt-Winters model with additive seasonality:

In the ETS() formula:

  • Specify error("A") (additive errors)
  • Specify trend("A")
  • Specify season("A")

ETS(VARIABLE ~ error("A") + trend("A") + season("A"))

Model specification to fit a Holt-Winters model with multiplicative seasonality:

In the ETS() formula:

  • Specify error("M") (multiplicative errors)
  • Specify trend("A")
  • Specify season("M")

ETS(VARIABLE ~ error("M") + trend("A") + season("M"))

IMPORTANT NOTE: it is very important that if you specify season("M"), you also specify error("M"). Otherwise your model could face numerical instabilities and your computer could either never finish fitting the model or fail to fit the model.

Code

Let us consider the number of total trips to Australia for the purpose “Holiday”. We will fit both a Holt-Winter’s additive and a Holt-Winter’s multiplicative method to this data.

# Filter data
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips)/1e3)

# Fit the models: additive and multiplicative seasonality
# Pay attention to the error terms! (for numerical stability):
  # error("A") for the additive model.
  # error("M") for the multiplicative model.
fit <- aus_holidays %>%
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )

# Generate forecasts
fc <- fit %>% forecast(h = "3 years")

# Plot the forecasts
fc %>%
  autoplot(aus_holidays, level = NULL) +
  labs(title="Australian domestic tourism",
       y="Overnight trips (millions)") +
  guides(colour = guide_legend(title = "Forecast"))

Comparing the fit accuracy metrics, we see that the multiplicative model has a slightly smaller residuals RMSE (better fit). The difference, however, is not substantial.

  • NOTE: RMSE will be explained in the session on point forecasts.
fit %>% accuracy() %>% select(.model, RMSE) %>% mutate(RMSE = round(RMSE, 4))
# A tibble: 2 × 2
  .model          RMSE
  <chr>          <dbl>
1 additive       0.417
2 multiplicative 0.412

We can plot the components of the resulting ETS decomposition (fitted values)

fig_additive <- fit %>% 
  select("additive") %>% 
  components() %>%
  autoplot() +
  labs(title = "ETS(A, N, A) components")

fig_multiplicative <- fit %>% 
  select("multiplicative") %>% 
  components() %>%
  autoplot() +
  labs(title = "ETS(M, N, M) components")

# Used patchwork library syntax to organize plots
fig_additive + fig_multiplicative
Warning: Removed 4 rows containing missing values (`geom_line()`).
Removed 4 rows containing missing values (`geom_line()`).

These models already have a substantial number of parameters! This is because the models require a lot of initial estimates, that in the end are also a parameter. Let us examine them using tidy()

fit %>% tidy()
# A tibble: 18 × 3
   .model         term   estimate
   <chr>          <chr>     <dbl>
 1 additive       alpha  0.262   
 2 additive       beta   0.0431  
 3 additive       gamma  0.000100
 4 additive       l[0]   9.79    
 5 additive       b[0]   0.0211  
 6 additive       s[0]  -0.534   
 7 additive       s[-1] -0.670   
 8 additive       s[-2] -0.294   
 9 additive       s[-3]  1.50    
10 multiplicative alpha  0.224   
11 multiplicative beta   0.0304  
12 multiplicative gamma  0.000100
13 multiplicative l[0]  10.0     
14 multiplicative b[0]  -0.0114  
15 multiplicative s[0]   0.943   
16 multiplicative s[-1]  0.927   
17 multiplicative s[-2]  0.969   
18 multiplicative s[-3]  1.16    
  • The small value of \(\gamma\) for both models means that the seasonal component barely changes over time.
  • The small value of \(\beta*\) means that the slope component barely changes over time.
  • 3 smoothing parameters \(\alpha\), \(\beta*\) and \(\gamma\).
  • 6 initial estimates!! (1 for the level \(l_0\), one for the trend \(b_0\) and 4 for the seasonal comonent \(s_{-3}, s_{-2}, ..., s_{0}\). If the data were monthly we would require 12 estimates. For longer seasonal periods (e.g. weekly data and yearly seasonality) the problem becomes substantially complex!

Recall the famous quote by the great mathematician John von Neumann:

“With four parameters I can fit an elephant and with five I can make it wiggle his trunk” (John von Neuman, cited by Enrico Fermi in Nature 427).

Holt-Winters damped method

This is often the single most accurate method for seasonal data. The difference is that now we include a damped trend in the multiplicative method, in the same mannera as we did for additive models.

\[ \begin{align*} \hat{y}_{t+h|t} &= \left[\ell_{t} + (\phi+\phi^2 + \dots + \phi^{h})b_{t}\right]s_{t+h-m(k+1)} \\ \ell_{t} &= \alpha(y_{t} / s_{t-m}) + (1 - \alpha)(\ell_{t-1} + \phi b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)\phi b_{t-1} \\ s_{t} &= \gamma \frac{y_{t}}{(\ell_{t-1} + \phi b_{t-1})} + (1 - \gamma)s_{t-m}. \end{align*} \]

Model specification to fit a Holt-Winters dampled model:

In the ETS() formula:

  • Specify error("M")
  • Specify trend("Ad")
  • Specify season("M")

ETS(VARIABLE ~ error("M") + trend("Ad") + season("M"))

R Example with daily data

  • m = 7 (seasonal period, 1 week)
sth_cross_ped <- pedestrian %>%
  filter(Date >= "2016-07-01",
         Sensor == "Southern Cross Station") %>%
  index_by(Date) %>%
  summarise(Count = sum(Count)/1000)

sth_cross_ped
# A tsibble: 184 x 2 [1D]
   Date       Count
   <date>     <dbl>
 1 2016-07-01 17.6 
 2 2016-07-02  2.52
 3 2016-07-03  1.73
 4 2016-07-04 17.3 
 5 2016-07-05 16.5 
 6 2016-07-06 16.8 
 7 2016-07-07 17.0 
 8 2016-07-08 17.2 
 9 2016-07-09  3.21
10 2016-07-10  1.75
# ℹ 174 more rows
fit <- sth_cross_ped %>%
  filter(Date <= "2016-07-31") %>%
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) 

fc <- fit %>%
  forecast(h = "2 weeks")

fc %>% 
  autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
  labs(title = "Daily traffic: Southern Cross",
       y="Pedestrians ('000)")

The model has identified the weekly seasonal pattern and the increasing trend at the end of the data.

The forecasts are a close match to the test data.

Exercise Seasonal Exp Smoothing

For this exercise use the quarterly number of arrivals to Australia from New Zealand, 1981 Q1 – 2012 Q3, from data set aus_arrivals.

nz_arrivals <- aus_arrivals %>% filter(Origin == "NZ")
nz_arrivals %>% autoplot()
Plot variable not specified, automatically selected `.vars = Arrivals`

1. Timeplot

Make a time plot of your data adjusting the grid to properly identify a yearly seasonality.

nz_arrivals %>% 
  autoplot(Arrivals / 1e3) +
  scale_x_yearmonth(date_breaks = "1 year", 
                  date_minor_breaks = "1 year",
                  date_labels = "%y") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(y = "Thousands of people")

Would you say that multiplicative seasonality is necessary?

The multiplicative seasonality is important in this example because seasonal
pattern increases in size proportionally to the level of the trend

2. Fit models to a training dataset

The code below creates a sub-slice of nz_arrivals that leaves out the last 8 observations. That is, it creates a training dataset nz_tr

nz_tr <- nz_arrivals %>%
  slice(1:(n() - 8))

Fit the following models to nz_tr:

  1. A Holt-Winter’s model with additive seasonality. Call it hw_additive
  2. A Holt-Winter’s model with mutliplicative seasonality. Call it hw_multipicative
  3. A Holt-Winter’s model with a damped trend and multiplicative seasonality. Call it hw_multipicative_damped
fit <- 
  nz_tr %>% 
    model(
      hw_additive = ETS(Arrivals ~ error("A") + trend("A") + season("A")),
      hw_multiplicative = ETS(Arrivals ~ error("M") + trend("A") + season("M")),
      hw_multiplicative_damped = ETS(Arrivals ~ error("M") + trend("Ad") + season("M")),
    )

3. Forecast 8 steps ahead

fc <- 
  fit %>% 
  forecast(h = 8)

fc
# A fable: 24 x 5 [1Q]
# Key:     Origin, .model [3]
   Origin .model            Quarter           Arrivals   .mean
   <chr>  <chr>               <qtr>             <dist>   <dbl>
 1 NZ     hw_additive       2010 Q4 N(312368, 1.8e+08) 312368.
 2 NZ     hw_additive       2011 Q1 N(235739, 2.3e+08) 235739.
 3 NZ     hw_additive       2011 Q2 N(294101, 2.8e+08) 294101.
 4 NZ     hw_additive       2011 Q3 N(334701, 3.4e+08) 334701.
 5 NZ     hw_additive       2011 Q4 N(317471, 4.8e+08) 317471.
 6 NZ     hw_additive       2012 Q1 N(240842, 5.3e+08) 240842.
 7 NZ     hw_additive       2012 Q2  N(3e+05, 5.9e+08) 299203.
 8 NZ     hw_additive       2012 Q3 N(339803, 6.4e+08) 339803.
 9 NZ     hw_multiplicative 2010 Q4   N(312454, 8e+08) 312454.
10 NZ     hw_multiplicative 2011 Q1 N(232812, 6.4e+08) 232812.
# ℹ 14 more rows

4. Graphs for the forecasts

Create the following graphs

  1. Forecasts along historical time series of all the models together without confidence intervals.
fc %>% 
  autoplot(nz_tr, level=FALSE)
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.

  1. Forecasts along historical time series for holt-winters multiplicative method with damped trend, including confidence intervals.
fc %>% 
  filter(.model == "hw_multiplicative") %>% 
  autoplot(nz_tr)

  1. Forecasts along historical time series for holt-winters additive method.
fc %>% 
  filter(.model == "hw_additive") %>% 
  autoplot(nz_tr)

  1. Forecasts along historical time series for holt-winters multiplicative seasonality and damped trend:
fc %>% 
  filter(.model == "hw_multiplicative_damped") %>% 
  autoplot(nz_tr)

  1. Do you reach any conclusion?
For the methods with multiplicative seasonality and multiplicative errors, the
uncertainty in the prediction intervals increases with the level of the series.

This has to do with the underlying statistical hypothesis to compute these
confidence intervals.

Importantly, this DOES NOT MEAN THAT THE ADDITIVE METHOD IS BEST IN THIS CASE
because it has narrower confidence intervals. The additive method is simply
NOT APPROPRIATE for this data, because the data is multiplicative.

We do not have time to study the math related to multipicative errors in this 
course, what is relevant is that you know how to use them. The reader interested is
referred to seciton 8.5 of the fpp3 book:
  
https://otexts.com/fpp3/ets.html

Questions on point-forecast accuracy and cross-validation.

Use cross-validation to compare the performance of the models below with forecasts up to 10 steps ahead:

  • a Seasonal Naïve method, which we will use as benchmark. Call it snaive
  • an ETS model fitted to the data without transformation. It must include an additive trend and appropriate seasonality. Call this model ETS_original.
  • an ETS model fitted to the data without transformation. It must include an additive damped trend and appropriate seasonality. Call this model ETS_original_Ad.
  • an ETS model fitted to the log-transformed data. It must include an additive trend and appropriate seasonality. Call this model ETS_original_log.
  • an ETS model fitted to the log-transformed data. It must include an additive damped trend and appropriate seasonality. Call this model ETS_original_Ad_log.
  • a decomposition_model() that applies a default STL decomposition to the data and then uses:
    • a trended ETS for the seasonality adjusted data (only trend and no seasonality).
    • a seasonal ETS model for the seasonal component (only seasonality, no trend).
    • call this model dcmp_ETS

Regarding the training datasets, the smallest one must have 80% of the observations in the dataset and must grow in steps of 1 observation.

5. Create the training datasets using stretch_tsibble:

Apply stretch_tsibble to nz_arrivals

init_obs <- as.integer(nrow(nz_arrivals)*0.8)

nz_cv <-
  nz_arrivals %>% 
  stretch_tsibble(
    .init = init_obs,
    .step = 1
  )

# Number of tr datasets created
max(nz_cv$.id)
[1] 27

6. Fit the models to the training datasets

# For a box-cox transformation (will be explained)
lambda <- 
  nz_arrivals %>% 
  features(Arrivals, guerrero) %>% 
  pull(lambda_guerrero)

fit_cv <- 
  nz_cv %>%
  model(
    snaive = SNAIVE(Arrivals),
    ETS_original = ETS(Arrivals ~ error("M") + trend("A") + season("M")),
    ETS_original_Ad = ETS(Arrivals ~ error("M") + trend("Ad") + season("M")),
    ETS_original_log = ETS(log(Arrivals) ~ error("A") + trend("A") + season("A")),
    ETS_original_Ad_log = ETS(log(Arrivals) ~ error("A") + trend("Ad") + season("A")),
    dcmp_ETS = decomposition_model(
                  STL(log(Arrivals)), # STL requires additive data!
                  ETS(season_adjust ~ error("A") + trend("A") + season("N")),
                  ETS(season_year ~ error("A") + trend("N") + season("A"))
              ),
    
    # Additional models added, explained at the end. Box-cox transformed data
    ETS_original_bc = ETS(box_cox(Arrivals, lambda) ~ error("A") + trend("A") + season("A")),
    ETS_original_Ad_bc = ETS(box_cox(Arrivals, lambda) ~ error("A") + trend("Ad") + season("A")),
  )

7.

Perform forecasts of up to 10 steps ahead. Introduce a column h that indicates the horizon corresponding to each of the forecasts. Pay attention that, at the end of the process, the resulting object is still a fable (see session on cross-validation to understand how to do this).

fc_cv <-  
  fit_cv %>% 
  forecast(h=10) 

fc_cv <- 
  fc_cv %>% 
  group_by(.id, .model) %>%
  mutate(
    h = row_number()
  ) %>%
  ungroup() %>% 
  as_fable(response = "Arrivals", distribution = Arrivals) %>% 
  select(h, everything())

8.

Compute the cross-validated accuracy metrics for each model, averaged over all forecast horizons.

Provide an interpretation of the MAE of the best model.

# For each model, avaeraged over all forecast horizons
summary_cv <- 
  fc_cv %>% 
  accuracy(nz_arrivals) %>% 
  arrange(RMSE)
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
10 observations are missing between 2012 Q4 and 2015 Q1
summary_cv
# A tibble: 8 × 11
  .model      Origin .type      ME   RMSE    MAE     MPE  MAPE  MASE RMSSE  ACF1
  <chr>       <chr>  <chr>   <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
1 snaive      NZ     Test    6937. 14630. 12304.  2.42    4.24 0.838 0.771 0.231
2 ETS_origin… NZ     Test    -484. 15670. 12284. -0.0873  4.23 0.837 0.826 0.456
3 ETS_origin… NZ     Test   -5651. 15763. 12008. -1.89    4.11 0.818 0.830 0.348
4 ETS_origin… NZ     Test    3346. 16441. 13128.  1.25    4.54 0.895 0.866 0.485
5 ETS_origin… NZ     Test   -1585. 17559. 14002. -0.648   4.80 0.954 0.925 0.406
6 ETS_origin… NZ     Test  -10056. 20199. 16240. -3.59    5.61 1.11  1.06  0.355
7 ETS_origin… NZ     Test  -17629. 25681. 19654. -6.06    6.76 1.34  1.35  0.421
8 dcmp_ETS    NZ     Test  -17367. 25892. 20098. -5.96    6.89 1.37  1.36  0.393
The MAE of the snaive indicates that our forecasts 
are on average 14629.67 units off. 

* Those units are "persons" (arrivals toaustralia from New Zeland).
* The average is taken over all training datasets and over all forecast horizons.

9.

Compute the cross-validated accuracy metrics for each model and forecast horizon.

Afterwards, create a table that contains only the best model in terms of RMSE for each forecast horizon.

Finally create a figure that shows which model is best for each forecast horizon in terms of RMSE.

# FOR THIS QUESTION IN PARTICULAR I AM GOING TO EXTEND THE FORECAST UP TO h=20
# TO PROVE A POINT (READ SOLUTION)
fc_cv <-  
  fit_cv %>% 
  forecast(h=20) 

fc_cv <- 
  fc_cv %>% 
  group_by(.id, .model) %>%
  mutate(
    h = row_number()
  ) %>%
  ungroup() %>% 
  as_fable(response = "Arrivals", distribution = Arrivals) %>% 
  select(h, everything())
# Computation for each model and forecast horizon separately:
summary_cv_h <- 
  fc_cv %>% 
  accuracy(nz_arrivals, by = c(".model", "h")) %>% 
  arrange(h, RMSE)
Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
20 observations are missing between 2012 Q4 and 2017 Q3
# Code to extract the best model for each forecast horizon
summary_cv_h %>% 
  group_by(h) %>% 
  slice(1)
# A tibble: 20 × 11
# Groups:   h [20]
   .model         h .type     ME   RMSE    MAE     MPE  MAPE  MASE RMSSE    ACF1
   <chr>      <int> <chr>  <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>
 1 snaive         1 Test   4133. 12679. 10498.  1.44    3.59 0.715 0.668  0.312 
 2 snaive         2 Test   4331. 12929. 10886.  1.51    3.72 0.742 0.681  0.283 
 3 snaive         3 Test   5282. 12644. 10569   1.82    3.62 0.720 0.666  0.318 
 4 ETS_origi…     4 Test  -2728. 12108.  8218. -0.975   2.82 0.560 0.638  0.318 
 5 snaive         5 Test   8553. 16161. 13418.  2.98    4.63 0.914 0.851  0.111 
 6 snaive         6 Test   7490. 15108. 12587.  2.63    4.36 0.858 0.796  0.213 
 7 ETS_origi…     7 Test   -549. 15027. 12566. -0.133   4.48 0.856 0.792  0.273 
 8 snaive         8 Test   7861. 15568. 12980.  2.76    4.51 0.885 0.820  0.237 
 9 snaive         9 Test  10557. 16840. 15059.  3.70    5.24 1.03  0.887  0.0686
10 snaive        10 Test  10513. 17111. 15280.  3.68    5.31 1.04  0.901  0.0765
11 ETS_origi…    11 Test    160. 15061. 13190.  0.0449  4.64 0.899 0.793  0.268 
12 ETS_origi…    12 Test   -819. 14245.  9821. -0.290   3.40 0.669 0.750  0.354 
13 ETS_origi…    13 Test  -5156. 17182. 13321. -2.00    4.55 0.908 0.905  0.107 
14 ETS_origi…    14 Test  -4264. 17041. 13952. -1.12    4.71 0.951 0.898  0.127 
15 ETS_origi…    15 Test  -1898. 15716. 12998. -0.639   4.46 0.886 0.828  0.112 
16 ETS_origi…    16 Test  -2554. 18430. 12603. -0.839   4.21 0.859 0.971  0.299 
17 ETS_origi…    17 Test  -9550. 22610. 14769. -3.37    5.02 1.01  1.19   0.0340
18 ETS_origi…    18 Test  -6981. 19078. 15719. -1.96    5.18 1.07  1.01   0.227 
19 ETS_origi…    19 Test  -4916. 18236. 16605. -1.62    5.56 1.13  0.961 -0.0458
20 ETS_origi…    20 Test  -7875. 22748. 18526. -2.38    6.11 1.26  1.20   0.181 
# Code to compute a graph showing which model is best for each forecast
# horizon
summary_cv_h %>% 
  ggplot(aes(x = h, y = RMSE, color = .model)) +
  geom_point() + 
  geom_line() +
  scale_x_continuous(
    breaks = seq(1, max(summary_cv_h$h)),
    minor_breaks = seq(1, max(summary_cv_h$h))
  )

The trend of the original data is quite mild. For small values of h, this trend
does not have a noticeable effect and, surprisingly, it appears that the snaive
outperforms the rest of the models for small forecast horizons (except perhaps
for h = 4).

However, as h increases, the effect of the trend becomes relevant and the models
that include trend start performing better (smaller RMSE). Remember that the
SNAIVE model does not account for the effect of the trend.

This is apparent from h = 11 onwards.

10.

Take your best model (on average over all horizons), fit it to the data and analyze its residuals.

# I WILL FIT MORE THAN 1 MODEL AND COMPARE, YOU COULD LIMIT YOURSELVES TO 1
fit <- 
  nz_arrivals %>% 
  model(
      snaive = SNAIVE(Arrivals),
      ETS_original = ETS(Arrivals ~ error("M") + trend("A") + season("M")),
      
      # Explained later why we fit these
      ETS_original_Ad_bc = ETS(box_cox(Arrivals, lambda) ~ error("A") + trend("Ad") + season("A")),
      ETS_original_Ad_log = ETS(box_cox(Arrivals, lambda) ~ error("A") + trend("Ad") + season("A"))
  )

fit %>% 
  select(snaive) %>% 
  gg_tsresiduals()
Warning: Removed 4 rows containing missing values (`geom_line()`).
Warning: Removed 4 rows containing missing values (`geom_point()`).
Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

model_vals <- 
  fit %>% 
  augment() %>% 
  filter(.model == "snaive")

# Mean
mean_innov <- mean(model_vals$.innov, na.rm = TRUE)
print(paste0("Mean of residuals: ", round(mean_innov, 2)))
[1] "Mean of residuals: 7437.15"
# qq-pot and box-plot
p1 <- ggplot(model_vals, aes(sample = .innov))
p1 <- p1 + stat_qq() + stat_qq_line()

# Generate box-plot
p2 <- ggplot(data = model_vals, aes(y = .innov)) +
      geom_boxplot(fill="light blue", alpha = 0.7) +
      stat_summary(aes(x=0), fun="mean", colour= "red") # Include the mean

# To run this, you need the libbrary "patchwork"
p1 + p2
Warning: Removed 4 rows containing non-finite values (`stat_qq()`).
Warning: Removed 4 rows containing non-finite values (`stat_qq_line()`).
Warning: Removed 4 rows containing non-finite values (`stat_boxplot()`).
Warning: Removed 4 rows containing non-finite values (`stat_summary()`).
Warning: Removed 1 rows containing missing values (`geom_segment()`).

Residuals of the seasonal naive behave suprisingly well in terms of normality
and homoskedasticity.

They are, however, correlated. Which means they have not captured properly the
time structure of the data.

Nonetheless, they lead to the forecast errors with smallest error on average
(cross-validation).

Be careful though, because this model would be very sensitive to a small change 
in the structure of the data. 

In fact all the models are sensitive to this because they are based only on 
past values of the data, but the seasonal naive is particularly ensitive because 
it only considers the last season.
fit %>% 
  select(ETS_original) %>% 
  gg_tsresiduals()

model_vals <- 
  fit %>% 
  augment() %>% 
  filter(.model == "ETS_original")

# Mean
mean_innov <- mean(model_vals$.innov, na.rm = TRUE)
print(paste0("Mean of residuals: ", round(mean_innov, 2)))

# qq-pot and box-plot
p1 <- ggplot(model_vals, aes(sample = .innov))
p1 <- p1 + stat_qq() + stat_qq_line()

# Generate box-plot
p2 <- ggplot(data = model_vals, aes(y = .innov)) +
      geom_boxplot(fill="light blue", alpha = 0.7) +
      stat_summary(aes(x=0), fun="mean", colour= "red") # Include the mean

# To run this, you need the libbrary "patchwork"
p1 + p2
Uncorrelated residuals
Mean 0 (onbiased residuals)

Deviation from normality (right skewed, visible in qqplo, box-plot and qq-plot)
Heteroskedastic residuals, before 1996 (approx), residuals have a higher variance.

Prediction intervals will therefore not be very reliable. 

If we examine the object `summary_cv`, we see that, in terms of MAPE (percent terms),
the first four models are very similar. We could therefore see if any of those
models has better residuals, sacrificing some accuracy for better residual
behavior, since the gain in accuracy is in this case (millions of visitors vs
few thousand improvement in accuracy) perhaps not so relevant. Better behavior
in terms of normality and homoskedasticity would lead to more reliable conf.
intervals.

Let us insect the models with Ad trend and a box-cox and log transformations.

We had seen that these transformations help mitigate heteroskedasticity problems
that stem from multiplicative seasonality.
# Examine residuals of ETS_original_Ad_bc
fit %>% 
  select(ETS_original_Ad_bc) %>% 
  gg_tsresiduals()

model_vals <- 
  fit %>% 
  augment() %>% 
  filter(.model == "ETS_original_Ad_bc")

# Mean
mean_innov <- mean(model_vals$.innov, na.rm = TRUE)
print(paste0("Mean of residuals: ", round(mean_innov, 2)))
[1] "Mean of residuals: 0.52"
# qq-pot and box-plot
p1 <- ggplot(model_vals, aes(sample = .innov))
p1 <- p1 + stat_qq() + stat_qq_line()

# Generate box-plot
p2 <- ggplot(data = model_vals, aes(y = .innov)) +
      geom_boxplot(fill="light blue", alpha = 0.7) +
      stat_summary(aes(x=0), fun="mean", colour= "red") # Include the mean

# To run this, you need the libbrary "patchwork"
p1 + p2
Warning: Removed 1 rows containing missing values (`geom_segment()`).

Uncorrelated residuals
Mean 0 (unbiased residuals)

Much better normality than ETS_original
Less heteroskedasticity than ETS_original.

Better prediction intervals with comparable accuracy (in percent terms).
# Examine residuals of ETS_original_Ad_log
fit %>% 
  select(ETS_original_Ad_log) %>% 
  gg_tsresiduals()

model_vals <- 
  fit %>% 
  augment() %>% 
  filter(.model == "ETS_original_Ad_log")

# Mean
mean_innov <- mean(model_vals$.innov, na.rm = TRUE)
print(paste0("Mean of residuals: ", round(mean_innov, 2)))
[1] "Mean of residuals: 0.52"
# qq-pot and box-plot
p1 <- ggplot(model_vals, aes(sample = .innov))
p1 <- p1 + stat_qq() + stat_qq_line()

# Generate box-plot
p2 <- ggplot(data = model_vals, aes(y = .innov)) +
      geom_boxplot(fill="light blue", alpha = 0.7) +
      stat_summary(aes(x=0), fun="mean", colour= "red") # Include the mean

# To run this, you need the libbrary "patchwork"
p1 + p2
Warning: Removed 1 rows containing missing values (`geom_segment()`).

Uncorrelated residuals
Mean 0 (unbiased residuals)

Much better normality than ETS_original
Less heteroskedasticity than ETS_original.

Better prediction intervals with comparable accuracy (in percent terms, MAPE).